Phase 3: Make script rewriters fragment-safe for streaming #591
Open
aram356 wants to merge 6 commits into feature/streaming-pipeline-phase2
Accumulate text fragments via Mutex<String> until last_in_text_node is true, then process the complete text. Intermediate fragments return RemoveNode to suppress output.
Accumulate text fragments via Mutex<String> until last_in_text_node is true, then match and rewrite on the complete text. Non-GTM scripts that were fragmented are emitted unchanged.
All script rewriters (NextJS __NEXT_DATA__, GTM) are now fragment-safe — they accumulate text internally until last_in_text_node. The buffered adapter workaround is no longer needed. Always use streaming mode in create_html_processor.
When rewrite_structured returns Keep on accumulated content, intermediate fragments were already removed via RemoveNode. Emit the full accumulated content via Replace to prevent silent data loss. Also updates spec to reflect Phase 3 completion.
- Add response.get_status().is_success() check to streaming gate so 4xx/5xx error pages stay buffered with complete status codes
- Add streaming gate unit tests covering all gate conditions
- Add stream_publisher_body gzip round-trip test
- Add small-chunk (32 byte) pipeline tests for __NEXT_DATA__ and GTM that prove fragmented text nodes survive the real lol_html path
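The accumulation behaviour these commits describe can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from this PR: the `Action` enum, the `AccumulatingRewriter` struct, and its `handle` method are hypothetical names that only mirror the RemoveNode / Replace / Keep actions mentioned above, and the plain substring match stands in for the real rewrite logic.

```rust
use std::sync::Mutex;

/// Illustrative action type (an assumption). The variants mirror the
/// RemoveNode / Replace / Keep actions named in the commit messages.
#[derive(Debug, PartialEq)]
enum Action {
    Keep,
    RemoveNode,
    Replace(String),
}

/// Hypothetical fragment-safe rewriter: buffers text fragments until the
/// last fragment of the text node arrives, then rewrites the full text.
struct AccumulatingRewriter {
    needle: &'static str,
    replacement: &'static str,
    buffer: Mutex<String>,
}

impl AccumulatingRewriter {
    fn new(needle: &'static str, replacement: &'static str) -> Self {
        Self { needle, replacement, buffer: Mutex::new(String::new()) }
    }

    fn handle(&self, fragment: &str, last_in_text_node: bool) -> Action {
        let mut buf = self.buffer.lock().expect("buffer mutex poisoned");

        // Intermediate fragment: stash it and suppress its output.
        if !last_in_text_node {
            buf.push_str(fragment);
            return Action::RemoveNode;
        }

        // Fast path: nothing was buffered, so Keep is safe when no rewrite applies.
        if buf.is_empty() {
            return if fragment.contains(self.needle) {
                Action::Replace(fragment.replace(self.needle, self.replacement))
            } else {
                Action::Keep
            };
        }

        // Fragmented path: take the complete text and reset the buffer.
        buf.push_str(fragment);
        let full = std::mem::take(&mut *buf);
        if full.contains(self.needle) {
            Action::Replace(full.replace(self.needle, self.replacement))
        } else {
            // Earlier fragments were already removed, so returning Keep here
            // would drop them. Re-emit the full accumulated text instead.
            Action::Replace(full)
        }
    }
}
```

Feeding `"https://www.google"` with `last_in_text_node = false` and then `"tagmanager.com/gtm.js"` with `true` yields `RemoveNode` followed by a single `Replace` carrying the rewritten text. A fragmented script that needs no rewrite comes back as `Replace` with its original content rather than `Keep`, which is the data-loss case one of the commits addresses.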
This was linked to issues on Mar 27, 2026
Phase 3 performance results: 35% TTFB improvement, 37% DOM Complete improvement on getpurpose.ai staging vs production. Phase 4 adds binary pass-through streaming via PublisherResponse::PassThrough.
Summary
Make all script rewriters fragment-safe so streaming works even with GTM and NextJS active. This removes the buffered fallback introduced in Phase 1, enabling full streaming for all configurations. Also adds the 2xx streaming gate, publisher-level tests, and small-chunk pipeline regression tests.
Closes #586, closes #587, closes #588, closes #589, closes #590.
Part of epic #563. Depends on Phase 2 (#585).
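The 2xx streaming gate mentioned in the summary can be approximated by the sketch below. The conditions are inferred only from the gate test names listed in this PR; the `GateInput` struct and `should_stream` function are hypothetical, and the real gate may check more than this.

```rust
/// Hypothetical inputs to the streaming gate (names are assumptions).
struct GateInput {
    /// HTTP status code of the origin response.
    status: u16,
    /// Whether the response body is HTML.
    is_html: bool,
    /// Whether any HTML post-processors are registered.
    has_post_processors: bool,
}

/// Returns true when the response may be streamed instead of buffered.
fn should_stream(input: &GateInput) -> bool {
    // Non-2xx responses stay buffered, whatever the content type.
    let is_success = (200..300).contains(&input.status);
    // HTML with post-processors needs the complete document, so it stays buffered.
    let needs_full_html = input.is_html && input.has_post_processors;
    is_success && !needs_full_html
}
```

Under these assumptions, a 200 HTML response without post-processors streams, a 200 non-HTML response streams even when post-processors are registered, and any 4xx/5xx response stays buffered.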
Performance results (staging vs production, median over 5 runs, Chrome 1440x900)
Production (v135) buffers the entire response body before sending any bytes to the client. Staging (v136) streams processed chunks incrementally via StreamingBody. The 35% TTFB improvement cascades into earlier paint metrics, and DOM Complete sees the largest absolute gain (-397ms) because the browser can parse and render while still receiving the body.
Problem
lol_html fragments text nodes across input chunk boundaries. When the streaming HtmlRewriterAdapter feeds chunks incrementally, a text node like "googletagmanager.com/gtm.js" can be split into "google" and "tagmanager.com/gtm.js". Neither fragment matches the full domain string, so the rewrite silently fails.
Phase 1 worked around this with a buffered adapter mode. Phase 3 fixes the root cause.
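To make the failure mode concrete, here is a small sketch of a rewriter that only ever sees one fragment at a time. It does not use lol_html. `rewrite_gtm`, `rewrite_per_fragment`, and the replacement host are hypothetical names chosen for illustration.

```rust
/// The string the rewriter looks for, taken from the example above.
const GTM_HOST: &str = "googletagmanager.com/gtm.js";

/// Rewrites the GTM host if, and only if, the full host string is present.
/// `replacement` is an illustrative value, not a real first-party path.
fn rewrite_gtm(text: &str, replacement: &str) -> Option<String> {
    if text.contains(GTM_HOST) {
        Some(text.replace(GTM_HOST, replacement))
    } else {
        None
    }
}

/// Applies `rewrite_gtm` to each fragment independently, the way a naive
/// streaming text handler would, and concatenates the output.
fn rewrite_per_fragment(fragments: &[&str], replacement: &str) -> String {
    fragments
        .iter()
        .map(|f| rewrite_gtm(f, replacement).unwrap_or_else(|| f.to_string()))
        .collect()
}
```

Passing the whole text as a single fragment rewrites the host. Passing the same text split after "google" returns it unchanged, with no error to signal that the rewrite was skipped.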
Solution
Each script rewriter now accumulates text fragments via Mutex<String> until last_in_text_node() is true, then processes the complete text:
- Intermediate fragments: RemoveNode (suppress output, accumulate)
- Last fragment: Replace(rewritten) or Keep
What changed
- script_rewriter.rs: NextJsNextDataRewriter accumulates fragments
- google_tag_manager.rs: GoogleTagManagerIntegration accumulates fragments
- streaming_processor.rs: new_buffered(), buffered flag, accumulated_input, buffered test
- html_processor.rs: has_script_rewriters check, always use streaming adapter
- publisher.rs: stream_publisher_body gzip test
- nextjs/mod.rs: __NEXT_DATA__ pipeline regression test
- google_tag_manager.rs
Tests added
- fragmented_next_data_is_accumulated_and_rewritten: splits __NEXT_DATA__ mid-URL
- unfragmented_next_data_works_without_accumulation: fast path still works
- fragmented_next_data_without_rewritable_urls_preserves_content: Keep-after-accumulation bug
- fragmented_gtm_snippet_is_accumulated_and_rewritten: splits GTM domain mid-string
- non_gtm_fragmented_script_is_passed_through: non-GTM scripts emitted unchanged
- small_chunk_next_data_rewrite_survives_fragmentation: 32-byte chunks through full HTML pipeline
- small_chunk_gtm_rewrite_survives_fragmentation: 32-byte chunks through full HTML pipeline
- streaming_gate_allows_2xx_html_without_post_processors: gate unit test
- streaming_gate_blocks_non_2xx_responses: 4xx/5xx stays buffered
- streaming_gate_blocks_html_with_post_processors: post-processors force buffering
- streaming_gate_allows_non_html_with_post_processors: non-HTML streams regardless
- streaming_gate_blocks_non_2xx_json: error JSON stays buffered
- stream_publisher_body_preserves_gzip_round_trip: public API gzip test
Verification
- cargo test --workspace: 766 passed, 0 failed
- cargo clippy --workspace --all-targets --all-features -- -D warnings: clean
- cargo fmt --all -- --check: clean
- cargo build --release --target wasm32-wasip1: success
Test plan
- __NEXT_DATA__ test passes